Introduction to Data Analysis and Graphical Presentation in Biostatistics with R by Thomas W. MacFarland

Author: Thomas W. MacFarland, Date: November 10, 2020

Language: eng
Format: epub
Publisher: Springer International Publishing, Cham

5.4 Conduct a Visual Data Check

With the data in proper format, it is tempting to move immediately to the appropriate inferential analysis, which is Oneway ANOVA for this lesson. However, it is best to first prepare a few graphical displays of the data and then to reinforce comprehension of the data with descriptive statistics and measures of central tendency.

The summary(), min(), and max() functions are all useful for data checking, but a visual data check offers many advantages as well. Simple plots can be very helpful when looking for data that may be either illogical or out-of-range. These initial plots are simple by design and should be considered throwaways, as they are intended only for initial diagnostic purposes. More complex figures, often of publishable quality, can then be developed from these initial throwaway graphics by careful selection of functions and arguments.
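As a companion to the visual check, a numeric range check might look like the following minimal sketch. The data frame demo.df and its values are hypothetical stand-ins invented for illustration (they are not the lesson's BioSpmen.df data), and the 0.00 to 600.00 limits are the range stated for M1 in this lesson.

```r
# Hypothetical stand-in for BioSpmen.df; the values below are invented
# purely for illustration and are not the book's data.
set.seed(1)
demo.df <- data.frame(
  ID = 1:20,
  M1 = c(round(runif(18, min = 0, max = 600), 2), 725.50, NA))

summary(demo.df$M1)            # Quartiles, mean, and a count of NAs
min(demo.df$M1, na.rm = TRUE)  # na.rm=TRUE is needed when NAs exist
max(demo.df$M1, na.rm = TRUE)

# Flag rows with M1 outside the stated 0.00 to 600.00 range.
# which() silently drops the NA comparison, so only real
# out-of-range values are returned.
out.of.range <- which(demo.df$M1 < 0 | demo.df$M1 > 600)
demo.df[out.of.range, ]
```

A check of this type only catches values outside known limits. Values that are within range but still implausible are usually easier to spot in a plot.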

Although the emphasis in this lesson is on Oneway ANOVA for the factor-type object variable F2b (five breakout groups) and the numeric-type object variable M1 (values can range from 0.00 to 600.00), a simple graphic will be prepared for each variable, largely as a quality assurance check against the entire dataset. Experienced researchers practice quality assurance in multiple ways and at multiple times.

names(BioSpmen.df) # Confirm all object variables.

par(ask=TRUE)
plot(BioSpmen.df$ID,
  main="BioSpmen.df$ID Visual Data Check")

par(ask=TRUE)
plot(density(BioSpmen.df$M1,
  na.rm=TRUE), # Required for the density() function
  main="Density Plot of M1",
  lwd=6, col="red", font.axis=2, font.lab=2)

par(ask=TRUE)
plot(density(BioSpmen.df$M2,
  na.rm=TRUE), # Required for the density() function
  main="Density Plot of M2",
  lwd=6, col="red", font.axis=2, font.lab=2)

par(ask=TRUE)
plot(density(BioSpmen.df$M3a,
  na.rm=TRUE), # Required for the density() function
  main="Density Plot of M3a",
  lwd=6, col="red", font.axis=2, font.lab=2)

par(ask=TRUE)
plot(density(BioSpmen.df$M3b,
  na.rm=TRUE), # Required for the density() function
  main="Density Plot of M3b",
  lwd=6, col="red", font.axis=2, font.lab=2)

par(ask=TRUE)
plot(density(BioSpmen.df$M3c,
  na.rm=TRUE), # Required for the density() function
  main="Density Plot of M3c",
  lwd=6, col="red", font.axis=2, font.lab=2)

par(ask=TRUE)
plot(BioSpmen.df$F1.recode,
  main="BioSpmen.df$F1.recode Visual Data Check")

par(ask=TRUE)
plot(BioSpmen.df$F2a.recode,
  main="BioSpmen.df$F2a.recode Visual Data Check")

par(ask=TRUE)
plot(BioSpmen.df$F2b.recode,
  main="BioSpmen.df$F2b.recode Visual Data Check",
  col=gray(4:0/4))
# Note the gray shading scheme and how the values range
# from 0 to 4 and not 1 to 5. Counts that begin with 0
# represent a common means of counting in computer science.
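To see what the col=gray(4:0/4) argument produces, the expression can be evaluated on its own. This is a minimal sketch that uses only the base R gray() function; the object names levels.gray and shades are hypothetical and introduced here only for illustration.

```r
# Operator precedence: ":" binds tighter than "/", so 4:0/4
# is evaluated as (4:0)/4, not 4:(0/4).
levels.gray <- 4:0 / 4
levels.gray                  # Five levels: 1.00 0.75 0.50 0.25 0.00

# gray() maps each level to a color, where 1 is white and 0 is black.
shades <- gray(levels.gray)
shades                       # Five hex color codes, lightest first
length(shades)               # One shade per F2b breakout group
```

Because the sequence runs from 4 down to 0, the first bar is drawn in the lightest shade and the last bar in the darkest.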

The purpose of these initial plots is to gain a general sense of the data and, equally, to look for outliers. To keep any outliers visible, the ylim argument has not been used, so all data are plotted. Extreme values may or may not be outliers, but they are certainly interesting and demand attention.
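Extreme values seen in a plot can also be listed by value. The sketch below uses the base R boxplot.stats() function, which applies the same 1.5 times interquartile range rule that boxplot() uses to draw individual points. This rule is a convention of the function and is offered here only as one possible screening approach, not as the method used in this lesson. The vector m1.demo is hypothetical and its values are invented.

```r
# Invented values for illustration only; not the book's data.
m1.demo <- c(212, 230, 245, 251, 260, 268, 274, 281, 290, 305, 598)

# boxplot.stats() returns, in $out, the points that lie more than
# 1.5 times the hinge spread beyond the lower or upper hinge.
extreme <- boxplot.stats(m1.demo)$out
extreme                      # Values that deserve a closer look

# Locate the flagged values in the original vector.
which(m1.demo %in% extreme)
```

A flagged value is a candidate for review, not an automatic deletion. Whether it is a true outlier depends on what is known about the data.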

This sample lesson has been designed to look into the nature of the numeric-type object variable M1 and the factor-type object variable F2b. Given the nature of M1 values, it may also be a good idea to supplement the plot(density()) function with the hist() function and the boxplot() function, to gain another view of the continuous values for this object variable. Although object variable M1 does not show a perfect normal distribution along a bell-shaped curve, it is assumed that the distribution of M1 approximates the conditions needed for correct use of Oneway ANOVA. Other functions in the lattice package and the sm package may also be useful to further explain how the data are organized.
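The supplementary views described above can be tried on simulated data before they are applied to the lesson's dataset. The following is a minimal sketch under stated assumptions: demo.df is a hypothetical data frame, the group labels A to E are invented placeholders for the five F2b breakout groups, and the M1 values are random draws rather than real measurements. The qqnorm() and qqline() functions are an additional base R option for judging normality and are not taken from this lesson.

```r
# Simulated stand-in data; names mirror the lesson but values are invented.
set.seed(42)
demo.df <- data.frame(
  F2b = factor(rep(c("A", "B", "C", "D", "E"), each = 12)),
  M1  = round(rnorm(60, mean = 300, sd = 75), 2))

# Histogram of all M1 values; hist() also returns the bin counts.
h <- hist(demo.df$M1,
  main = "Histogram of Simulated M1",
  col = "red")

# One box per breakout group, the comparison Oneway ANOVA will test.
b <- boxplot(M1 ~ F2b, data = demo.df,
  main = "Simulated M1 by F2b Breakout Group")

# Normal Q-Q plot; points close to the line suggest approximate normality.
qqnorm(demo.df$M1)
qqline(demo.df$M1)
```

The objects h and b hold the numbers behind each figure (for example, h$counts and b$stats), which can be useful when a plot raises a question about a specific bin or group.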

par(ask=TRUE)
hist(BioSpmen.df$M1,
  main="BioSpmen.df$M1 Visual Data Check (Histogram)",
  font=2, # Bold text
  cex.lab=1.15, # Large font
  col="red") # Vibrant color

par(ask=TRUE)

boxplot(BioSpmen.
